377 research outputs found

    CpG islands or CpG clusters: how to identify functional GC-rich regions in a genome?

    Background: CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located in the 5' end of genes and considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different mathematical approach from previous traditional algorithms. Their evaluation suggests that CpGcluster provides a much more efficient approach to detecting functional clusters or islands of CpGs. Results: We systematically compared CpGcluster with the traditional algorithm by Takai and Jones (2002). Our comparisons of (1) the number of islands versus the number of genes in a genome, (2) the distribution of islands in different genomic regions, (3) island length, (4) the distance between two neighboring islands, and (5) methylation status suggest that Takai and Jones' algorithm is overall more appropriate for identifying promoter-associated islands of CpGs in vertebrate genomes. Conclusion: The generation of genome sequence and DNA methylation data is expected to accelerate greatly. The information in this study is important for its extensive utility in gene feature analysis and epigenomics, including gene prediction and methylation chip design in different genomes.

    Fast and Space-Efficient Location of Heavy or Dense Segments in Run-Length Encoded Sequences

    This paper considers several variations of an optimization problem with potential applications in such areas as biomolecular sequence analysis and image processing. Given a sequence of items, each with a weight and a length, the goal is to find a subsequence of consecutive items of optimal value, where value is either total weight or total weight divided by total length. There may also be a specified lower and/or upper bound on the acceptable length of subsequences. This paper shows that all the variations of the problem are solvable in linear time and space even with non-uniform item lengths and divisible items, implying that run-length encoded sequences can be handled in time and space linear in the number of runs. Furthermore, some problem variations can be solved in constant space. Also, these time and space bounds suffice for certain problem variations in which we call for reporting of many “good” subsequences
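The simplest variation described above (maximum total weight, no length bounds) can be illustrated with the classic Kadane scan, which runs in linear time and constant extra space. This is a minimal sketch and not the paper's algorithm: the function name `heaviest_segment` and the `(weight, length)` tuple layout are assumptions made for illustration, and the density (weight divided by length) and length-bounded variations are not covered.

```python
def heaviest_segment(items):
    """Return (best_weight, start, end) for the consecutive run of items
    with maximum total weight; `end` is exclusive.

    `items` is a list of (weight, length) pairs (hypothetical layout).
    The length is carried along but unused here, because this sketch
    only handles the total-weight objective without length bounds.
    """
    if not items:
        raise ValueError("items must be non-empty")

    best_weight = items[0][0]
    best_start, best_end = 0, 1
    current = 0          # weight of the best segment ending at the current item
    current_start = 0

    for i, (weight, _length) in enumerate(items):
        if current <= 0:
            # A non-positive prefix never helps: start a fresh segment here.
            current = weight
            current_start = i
        else:
            current += weight
        if current > best_weight:
            best_weight = current
            best_start, best_end = current_start, i + 1

    return best_weight, best_start, best_end


runs = [(2, 1), (-3, 2), (4, 1), (-1, 1), (3, 2), (-5, 1)]
print(heaviest_segment(runs))
```

Because each run contributes a single weight, the scan touches every run once, which matches the idea of working in time linear in the number of runs rather than in the length of the decoded sequence.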

    WordCluster: detecting clusters of DNA words and genomic elements

    Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method as a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method as a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapped genes.
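The idea of grouping copies of a word by the distance between consecutive occurrences can be sketched as follows. This is only an illustration of distance-based chaining, not the WordCluster method: the function names and the fixed `max_gap` parameter are assumptions, and the statistical significance step that the abstract describes (which is what avoids an arbitrary threshold) is deliberately left out.

```python
def find_positions(seq, word):
    """Return the start positions of every (possibly overlapping) copy of `word`."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def chain_clusters(positions, max_gap, min_copies=2):
    """Chain sorted positions whose consecutive distance is at most `max_gap`.

    Returns a list of (first_position, last_position, copy_count) tuples.
    `max_gap` is a hypothetical fixed threshold used only for illustration.
    """
    clusters = []
    current = []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            # Gap too large: close the running chain and start a new one.
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


print(find_positions("CAGCAGTTCAG", "CAG"))
print(chain_clusters([1, 3, 5, 50, 52, 100], max_gap=5))
```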

    Knowledge sharing and collaboration in translational research, and the DC-THERA Directory

    Biomedical research relies increasingly on large collections of data sets and knowledge whose generation, representation and analysis often require large collaborative and interdisciplinary efforts. This dimension of ‘big data’ research calls for the development of computational tools to manage such a vast amount of data, as well as tools that can improve communication and access to information from collaborating researchers and from the wider community. Whenever research projects have a defined temporal scope, an additional issue of data management arises, namely how the knowledge generated within the project can be made available beyond its boundaries and life-time. DC-THERA is a European ‘Network of Excellence’ (NoE) that spawned a very large collaborative and interdisciplinary research community, focusing on the development of novel immunotherapies derived from fundamental research in dendritic cell immunobiology. In this article we introduce the DC-THERA Directory, which is an information system designed to support knowledge management for this research community and beyond. We present how the use of metadata and Semantic Web technologies can effectively help to organize the knowledge generated by modern collaborative research, how these technologies can enable effective data management solutions during and beyond the project lifecycle, and how resources such as the DC-THERA Directory fit into the larger context of e-science

    Inheritance of an Epigenetic Mark: The CpG DNA Methyltransferase 1 Is Required for De Novo Establishment of a Complex Pattern of Non-CpG Methylation

    Site-specific methylation of cytosines is a key epigenetic mark of vertebrate DNA. While a majority of the methylated residues are in the symmetrical (meC)pG:Gp(meC) configuration, a smaller but significant fraction is found in the CpA, CpT and CpC asymmetric (non-CpG) dinucleotides. CpG methylation is reproducibly maintained by the activity of the DNA methyltransferase 1 (Dnmt1) on the newly replicated hemimethylated substrates (meC)pG:GpC. On the other hand, establishment and hereditary maintenance of non-CpG methylation patterns have not been analyzed in detail. We previously reported the occurrence of site- and allele-specific methylation at both CpG and non-CpG sites. Here we characterize a hereditary complex of non-CpG methylation, with the transgenerational maintenance of three distinct profiles in a constant ratio, associated with extensive CpG methylation. These observations raised the question of the signal leading to the maintenance of the pattern of asymmetric methylation. The complete non-CpG pattern was reinstated at each generation in spite of the fact that the majority of the sperm genomes contained either none or only one methylated non-CpG site. This observation led us to the hypothesis that the stable CpG patterns might act as blueprints for the maintenance of non-CpG DNA methylation. As predicted, non-CpG DNA methylation profiles were abrogated in a mutant lacking Dnmt1, the enzyme responsible for CpG methylation, but not in mutants defective for either Dnmt3a or Dnmt2.

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

    Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome

    BACKGROUND: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located in the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose here a new method called CPSORL to predict CpG islands, which consists of a complement particle swarm optimization algorithm combined with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding window technique have been developed previously. However, the quality of the results seems to rely too much on the choices that are made for the window sizes, and thus these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL provides results with a higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in the entire human genome. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2%) and promoter (26.1%) region. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger TSS (40.5%) and promoter (67.8%) region than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33% for CpG islands in the entire human genome.
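The sliding-window style of CpG island detection mentioned above can be sketched as a per-window check on GC content and the observed/expected CpG ratio. This is a minimal illustration and not CPSORL or any of the tools named in the abstract: the function names are hypothetical, and the default thresholds (length at least 200 bp, GC content at least 0.5, observed/expected ratio at least 0.6) are assumed here as commonly quoted criteria rather than taken from the abstract, which only states the length figure.

```python
def cpg_stats(seq):
    """Return (gc_content, observed_over_expected_cpg) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")            # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    expected = c * g / n             # expected CpG count given base composition
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_content, obs_exp


def is_candidate_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Check one window against the assumed thresholds (illustrative defaults)."""
    if len(seq) < min_len:
        return False
    gc_content, obs_exp = cpg_stats(seq)
    return gc_content >= min_gc and obs_exp >= min_oe


def scan_windows(seq, window=200, step=100):
    """Return start offsets of fixed-size windows that pass the check."""
    return [
        start
        for start in range(0, len(seq) - window + 1, step)
        if is_candidate_island(seq[start:start + window], min_len=window)
    ]


genome = "AT" * 100 + "CG" * 100 + "AT" * 100
print(scan_windows(genome))
```

The result depends directly on `window` and `step`, which is the sensitivity to window-size choices that the abstract points out as a weakness of this family of methods.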

    Comparative analysis of sequence characteristics of imprinted genes in human, mouse, and cattle

    Genomic imprinting is an epigenetic mechanism that results in monoallelic expression of genes depending on parent-of-origin of the allele. Although the conservation of genomic imprinting among mammalian species has been widely reported for many genes, there is accumulating evidence that some genes escape this conservation. Most known imprinted genes have been identified in the mouse and human, with few imprinted genes reported in cattle. Comparative analysis of genomic imprinting across mammalian species would provide a powerful tool for elucidating the mechanisms regulating the unique expression of imprinted genes. In this study we analyzed the imprinting of 22 genes in human, mouse, and cattle and found that in only 11 was imprinting conserved across the three species. In addition, we analyzed the occurrence of the sequence elements CpG islands, C + G content, tandem repeats, and retrotransposable elements in imprinted and in nonimprinted (control) cattle genes. We found that imprinted genes have a higher G + C content and more CpG islands and tandem repeats. Short interspersed nuclear elements (SINEs) were notably fewer in number in imprinted cattle genes compared to control genes, which is in agreement with previous reports for human and mouse imprinted regions. Long interspersed nuclear elements (LINEs) and long terminal repeats (LTRs) were found to be significantly underrepresented in imprinted genes compared to control genes, contrary to reports on human and mouse. Of considerable significance was the finding of highly conserved tandem repeats in nine of the genes imprinted in all three species

    A Meta-Analysis of Microarray Gene Expression in Mouse Stem Cells: Redefining Stemness

    While much progress has been made in understanding stem cell (SC) function, a complete description of the molecular mechanisms regulating SCs is not yet established. This lack of knowledge is a major barrier holding back the discovery of therapeutic uses of SCs. We investigated the value of a novel meta-analysis of microarray gene expression in mouse SCs to aid the elucidation of regulatory mechanisms common to SCs and particular SC types. We added value to previously published microarray gene expression data by characterizing the promoter type likely to regulate transcription. Promoters of up-regulated genes in SCs were characterized in terms of alternative promoter (AP) usage and CpG-richness, with the aim of correlating features known to affect transcriptional control with SC function. We found that SCs have a higher proportion of up-regulated genes using CpG-rich promoters compared with the negative controls. Comparing subsets of SC type with the controls, a slightly different story unfolds: the differences between the proliferating adult SCs and the embryonic SCs versus the negative controls are statistically significant, whilst the difference between the quiescent adult SCs and the negative controls is not. On examination of AP usage, no difference was observed between SCs and the controls. However, comparing the subsets of SC type with the controls, the quiescent adult SCs are found to up-regulate a larger proportion of genes that have APs compared to the controls, and the converse is true for the proliferating adult SCs and the embryonic SCs. These findings suggest that looking at features associated with control of transcription is a promising future approach for characterizing “stemness” and that further investigations of stemness could benefit from separate considerations of different SC states. For example, “proliferating-stemness” is shown here, in terms of promoter usage, to be distinct from “quiescent-stemness”.